Unsupervised methods for developing taxonomies by combining syntactic and statistical information
Abstract
This paper describes an unsupervised algorithm for placing unknown words into a taxonomy and evaluates its accuracy on a large and varied sample of words. The algorithm works by first using a large corpus to find semantic neighbors of the unknown word, which we accomplish by combining latent semantic analysis with part-of-speech information. We then place the unknown word in the part of the taxonomy where these neighbors are most concentrated, using a class-labelling algorithm developed especially for this task. This method is used to reconstruct parts of the existing WordNet database, obtaining results for common nouns, proper nouns and verbs. We evaluate the contribution made by part-of-speech tagging and show that automatic filtering using the class-labelling algorithm gives a fourfold improvement in accuracy.
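The two stages described in the abstract (find neighbors, then label the region of the taxonomy where they cluster) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the toy vectors stand in for latent semantic analysis output, the part-of-speech filter is a plain dictionary lookup, and the scoring rule in `best_class` (similarity divided by squared hypernym distance, with a flat penalty of 0.25 for neighbors outside the class) is an assumption chosen for the example. All names, data, and parameters are hypothetical.

```python
from math import sqrt

# Toy stand-ins (assumptions): 2-d vectors instead of real LSA vectors,
# a hand-written POS lexicon, and a tiny child -> parent hypernym map.
TOY_VECTORS = {
    "fox": (0.9, 0.1), "dog": (1.0, 0.0), "cat": (0.8, 0.3),
    "sparrow": (0.5, 0.5), "oak": (0.0, 1.0), "run": (0.9, 0.1),
}
TOY_POS = {"dog": "n", "cat": "n", "sparrow": "n", "oak": "n", "run": "v"}
TOY_PARENT = {
    "entity": None, "animal": "entity", "plant": "entity",
    "mammal": "animal", "bird": "animal",
    "dog": "mammal", "cat": "mammal", "sparrow": "bird", "oak": "plant",
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_neighbors(target, vectors, pos_tags, pos, k=3):
    """Return the k most similar words that carry the requested POS tag."""
    target_vec = vectors[target]
    scored = [
        (cosine(target_vec, vec), word)
        for word, vec in vectors.items()
        if word != target and pos_tags.get(word) == pos
    ]
    scored.sort(reverse=True)
    return [(word, sim) for sim, word in scored[:k]]


def ancestor_distances(word, parent):
    """Map each hypernym of `word` to its distance (direct parent = 1)."""
    distances = {}
    node, dist = parent.get(word), 1
    while node is not None:
        distances[node] = dist
        node, dist = parent.get(node), dist + 1
    return distances


def best_class(neighbors, parent, penalty=0.25):
    """Pick the class whose subtree best concentrates the neighbors.

    Assumed scoring rule: a neighbor below class h adds sim / distance**2,
    and a neighbor outside h subtracts a flat penalty.
    """
    per_neighbor = [(sim, ancestor_distances(w, parent)) for w, sim in neighbors]
    candidates = set().union(*(dists for _, dists in per_neighbor))

    def score(h):
        return sum(
            sim / dists[h] ** 2 if h in dists else -penalty
            for sim, dists in per_neighbor
        )

    return max(candidates, key=score) if candidates else None
```

Calling `semantic_neighbors("fox", TOY_VECTORS, TOY_POS, "n")` and passing the result to `best_class` with `TOY_PARENT` places the unknown word under the most specific class that still covers most of its neighbors. The squared distance favors specific classes over very general ones, and the penalty discourages classes that cover only a stray neighbor.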
Related papers
Combining statistical and semantic approaches to the translation of ontologies and taxonomies
Ontologies and taxonomies are widely used to organize concepts, providing the basis for activities such as indexing and serving as background knowledge for NLP tasks. As such, translation of these resources would prove useful to adapt these systems to new languages. However, we show that the nature of these resources is significantly different from the “free-text” paradigm used to train most statistica...
Unsupervised Knowledge Extraction for Taxonomies of Concepts from Wikipedia
A novel method for unsupervised acquisition of knowledge for taxonomies of concepts from raw Wikipedia text is presented. We assume that the concepts classified under the same node in a taxonomy are described in a comparable way in Wikipedia. The concepts in 6 taxonomies extracted from WordNet are mapped onto Wikipedia pages and the lexico-syntactic patterns describing semantic structures expre...
Combining Supervised and Unsupervised Parsing for Distributional Similarity
In this paper, we address the role of syntactic parsing for distributional similarity. On the one hand, we are exploring distributional similarities as an extrinsic test bed for unsupervised parsers. On the other hand, we explore whether single unsupervised parsers, or their combination, can contribute to better distributional similarities, or even replace supervised parsing as a preprocessing ...
Taxonomy Learning Using Word Sense Induction
Taxonomies are an important resource for a variety of Natural Language Processing (NLP) applications. Despite this, the current state-of-the-art methods in taxonomy learning have disregarded word polysemy, in effect developing taxonomies that conflate word senses. In this paper, we present an unsupervised method that builds a taxonomy of senses learned automatically from an unlabelled corpus. O...
Pattern-based automatic taxonomy learning from the Web
The construction of taxonomies is considered the first step in structuring domain knowledge. Many methodologies have been developed in the past for building taxonomies from classical information repositories such as dictionaries, databases or domain text. However, in recent years, scientists have started to consider the Web as a valuable repository of knowledge. In this paper we present a n...
Publication date: 2003